TensorFlow: A Few Insights from RNN Practice

Some notes on understanding RNNs

0. LSTM

Inputs to an LSTM cell

$X=\{x_1, x_2, \dots, x_n\}$, with X.shape = (counts, n, ...)

| # | Symbol | Description | Shape | tf/keras |
|---|--------|-------------|-------|----------|
| 1 | $C_{i-1}$ | Cell state of the previous cell (memory cell) | | initialization |
| 2 | $H_{i-1}$ | Output of the previous cell (hidden layer) | shape=(counts, 1, units) | |
| 3 | $x_i$ | Input to the current cell | $x_i$.shape=(counts, 1, ...) | data input |

Outputs of an LSTM cell

| # | Symbol | Description | Shape | tf/keras |
|---|--------|-------------|-------|----------|
| 1 | $C_i$ | Cell state of the current cell (memory cell) | | |
| 2 | $H_i$ | Output of the current cell (hidden layer) | shape=(counts, 1, units) | |

About the gates

- The forget gate decides which information should be discarded and which should be kept.
- The input gate (update gate) is used to update the cell state.
- The output gate determines the value of the next hidden state.

| Gate | Range |
|------|-------|
| $\Gamma_f^{\langle t \rangle}$ (forget) | (0, 1) |
| $\Gamma_u^{\langle t \rangle}$ (update/input) | (0, 1) |
| $\Gamma_o^{\langle t \rangle}$ (output) | (0, 1) |

- Forget gate

$$\Gamma_f^{\langle t \rangle} = \sigma(W_f[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)\tag{1} $$

$b_f$: the forget gate bias; in TensorFlow it appears as tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0)
$h^{\langle t-1 \rangle}$: the hidden state of the previous cell
$W_f$: the forget gate weights; in Keras the part acting on $h^{\langle t-1 \rangle}$ is initialized by recurrent_initializer and the part acting on $x^{\langle t \rangle}$ by kernel_initializer
$x^{\langle t \rangle}$: the current input

With num_units=128, the dimensions of the terms in formula (1) are:

| Parameter | Shape |
|-----------|-------|
| $h^{\langle t-1 \rangle}$ | (128, 1) |
| $x^{\langle t \rangle}$ | (28, 1) |
| $[h^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ | (28, 1) + (128, 1) = (156, 1) |
| $W_f$ | (128, 156) |
| $b_f$ | (128, 1) |

Because of the sigmoid activation, $0 \le \Gamma_f^{\langle t \rangle} \le 1$.
If $\Gamma_f^{\langle t \rangle} = 0$, the current LSTM cell discards the previous cell state $c^{\langle t-1 \rangle}$ (see equation (4)).

If $\Gamma_f^{\langle t \rangle} = 1$, the current LSTM cell keeps the previous cell state $c^{\langle t-1 \rangle}$.
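A minimal NumPy sketch of equation (1) with the dimensions in the table above (num_units=128 and an input size of 28 are assumed from the table; the values are random and only the shapes matter):

import numpy as np

num_units, input_dim = 128, 28                            # assumed sizes from the table above
h_prev = np.random.randn(num_units, 1)                    # h^{<t-1>}, shape (128, 1)
x_t = np.random.randn(input_dim, 1)                       # x^{<t>},   shape (28, 1)
W_f = np.random.randn(num_units, num_units + input_dim)   # shape (128, 156)
b_f = np.zeros((num_units, 1))                            # shape (128, 1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
concat = np.vstack([h_prev, x_t])                         # [h^{<t-1>}, x^{<t>}], shape (156, 1)
gamma_f = sigmoid(W_f @ concat + b_f)                     # forget gate, shape (128, 1), values in (0, 1)
print(gamma_f.shape)                                      # (128, 1)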

- Update gate / input gate

$$\Gamma_u^{\langle t \rangle} = \sigma(W_u[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)\tag{2} $$

Similar to the forget gate, here $\Gamma_u^{\langle t \rangle}$ is again a vector of values between 0 and 1. This will be multiplied element-wise with $\tilde{c}^{\langle t \rangle}$, in order to compute $c^{\langle t \rangle}$.

- Updating the cell state (using the update gate and the forget gate)

To update the cell state, we first create a new candidate vector $\tilde{c}^{\langle t \rangle}$ that can be combined with the previous cell state. The equation we use is:

$$ \tilde{c}^{\langle t \rangle} = \tanh(W_c[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)\tag{3} $$

Finally, the new cell state is:

$$ c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \tilde{c}^{\langle t \rangle} \tag{4} $$

The tanh in equations (3) and (6) corresponds to the activation argument of tf.keras.layers.LSTM (activation='tanh'), while the sigmoid used by the gates corresponds to recurrent_activation.

- Output gate

To decide which outputs we will use, we will use the following two formulas:

$$ \Gamma_o^{\langle t \rangle}= \sigma(W_o[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)\tag{5}$$
$$ h^{\langle t \rangle} = \Gamma_o^{\langle t \rangle}* \tanh(c^{\langle t \rangle})\tag{6} $$
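Putting equations (1)-(6) together, one forward step of an LSTM cell can be sketched in NumPy as follows (an illustration of the formulas above, not the TensorFlow implementation; the weight shapes follow the convention used earlier):

import numpy as np

def lstm_cell_step(x_t, h_prev, c_prev, W_f, b_f, W_u, b_u, W_c, b_c, W_o, b_o):
    # One forward step of an LSTM cell following equations (1)-(6).
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    concat = np.vstack([h_prev, x_t])            # [h^{<t-1>}, x^{<t>}]
    gamma_f = sigmoid(W_f @ concat + b_f)        # (1) forget gate
    gamma_u = sigmoid(W_u @ concat + b_u)        # (2) update/input gate
    c_tilde = np.tanh(W_c @ concat + b_c)        # (3) candidate cell state
    c_t = gamma_f * c_prev + gamma_u * c_tilde   # (4) new cell state
    gamma_o = sigmoid(W_o @ concat + b_o)        # (5) output gate
    h_t = gamma_o * np.tanh(c_t)                 # (6) new hidden state
    return h_t, c_t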

LSTM variants

1. LSTMCell examples

Parsing the Keras LSTM arguments

tf.keras.layers.LSTM

tf.layers.Layer
|_tf.keras.layers.Layer
    |_tf.keras.layers.RNN
        |_tf.keras.layers.LSTM

__init__(
    units,
    activation='tanh',
    recurrent_activation='hard_sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    implementation=1,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)

__call__(
    inputs,
    initial_state=None, # a tensor or list of tensors representing the initial state of the RNN layer.
    constants=None,
    **kwargs
)
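For example, the initial hidden state and cell state can be passed explicitly through initial_state; when it is omitted, zero states are used (a minimal sketch; the placeholder names are illustrative):

import tensorflow as tf
from tensorflow.keras.layers import LSTM

units = 2
x = tf.placeholder(tf.float32, shape=(None, 1, 3))
h0 = tf.placeholder(tf.float32, shape=(None, units))   # initial hidden state h^{<0>}
c0 = tf.placeholder(tf.float32, shape=(None, units))   # initial cell state c^{<0>}
out = LSTM(units)(x, initial_state=[h0, c0])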
  1. return_sequences=False; return_state=False
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

x=tf.placeholder(dtype=tf.float32,shape=(None,1,3))
cell_num=2
a=LSTM(cell_num)(x)
print ("cell_num is {}".format(cell_num))
sess=tf.Session()
x_instance=np.array([[[1,2,3]],[[1,2,3]],[[1,2,3]],[[1,2,3]]])
print ("x input counts/shape[0] is {}".format(x_instance.shape[0]))
sess.run(tf.global_variables_initializer())
h_t=sess.run(a,feed_dict={x:x_instance})
print ("h_t shape is {}".format(h_t.shape))

#cell_num is 2
#x input counts/shape[0] is 4
#h_t shape is (4, 2)

h_t is $h^{\langle t \rangle}$, the hidden state of the last (here the only) time step; its shape is (counts, units) = (4, 2).
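To relate this to the $W$ and $b$ matrices above, keep a reference to the layer object and inspect its trainable weights (a sketch reusing the same placeholder x and session as above):

layer = LSTM(cell_num)
a = layer(x)
sess.run(tf.global_variables_initializer())
kernel, recurrent_kernel, bias = sess.run(layer.weights)
print(kernel.shape)            # (3, 4*cell_num): input weights for the i, f, c, o blocks (kernel_initializer)
print(recurrent_kernel.shape)  # (cell_num, 4*cell_num): recurrent weights (recurrent_initializer)
print(bias.shape)              # (4*cell_num,): bias; its forget-gate slice is affected by unit_forget_bias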

  2. return_sequences=True; return_state=False
x=tf.placeholder(dtype=tf.float32,shape=(None,1,3))
cell_num=2
a=LSTM(cell_num, return_sequences=True, return_state=False)(x)
print ("cell_num is {}".format(cell_num))
sess=tf.Session()
x_instance=np.array([[[1,2,3]],[[1,2,3]],[[1,2,3]],[[1,2,3]]])
print ("x input counts/shape[0] is {},x input timestamp/shape[1] is {},shape is {}".format(x_instance.shape[0],x_instance.shape[1],x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all=sess.run(a,feed_dict={x:x_instance})
print ("H_all shape is {}".format(H_all.shape))
# cell_num is 2
# x input counts/shape[0] is 4,x input time_steps/shape[1] is 1,shape is (4, 1, 3)
# H_all shape is (4, 1, 2)

H_all is the sequence of hidden states for every time step, $(h^{\langle 1 \rangle}, h^{\langle 2 \rangle}, \dots, h^{\langle t \rangle})$, stacked along the time axis; its shape is (counts, time_steps, units) = (4, 1, 2).

  3. return_sequences=False; return_state=True
x=tf.placeholder(dtype=tf.float32,shape=(None,1,3))
cell_num=2

a=LSTM(cell_num, return_sequences=False, return_state=True)(x)
print ("cell_num is {}".format(cell_num))
sess=tf.Session()
x_instance=np.array([[[1,2,3]],[[1,2,3]],[[1,2,3]],[[1,2,3]]])
print ("x input counts/shape[0] is {},x input timestamp/shape[1] is {},shape is {}".format(x_instance.shape[0],x_instance.shape[1],x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all, h_t, c_t  =sess.run(a,feed_dict={x:x_instance})

In this case H_all == h_t: with return_sequences=False the layer output is just the hidden state of the last time step, which is returned again as the first state tensor.
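This can be checked directly:

print(np.allclose(H_all, h_t))            # True: the output equals the last hidden state
print(H_all.shape, h_t.shape, c_t.shape)  # (4, 2) (4, 2) (4, 2)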

  4. return_sequences=True; return_state=True
x=tf.placeholder(dtype=tf.float32,shape=(None,1,3))
cell_num=2
a=LSTM(cell_num, return_sequences=True, return_state=True)(x)
print ("cell_num is {}".format(cell_num))
sess=tf.Session()
x_instance=np.array([[[1,2,3]],[[1,2,3]],[[1,2,3]],[[1,2,3]]])
print ("x input counts/shape[0] is {},x input timestamp/shape[1] is {},shape is {}".format(x_instance.shape[0],x_instance.shape[1],x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all, h_t, c_t  =sess.run(a,feed_dict={x:x_instance})

H_all is the full sequence of hidden states $(h^{\langle 1 \rangle}, \dots, h^{\langle t \rangle})$ with shape (counts, time_steps, units); h_t and c_t are the hidden state and the cell state of the last time step, each with shape (counts, units).
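With both flags set, the last slice of H_all along the time axis equals h_t:

print(H_all.shape, h_t.shape, c_t.shape)  # (4, 1, 2) (4, 2) (4, 2)
print(np.allclose(H_all[:, -1, :], h_t))  # True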

After a preliminary survey, the commonly used LSTM layers are Keras.layers.LSTM, Tensorflow.contrib.nn.LSTMCell and Tensorflow.nn.rnn_cell.LSTMCell; the latter two share the same implementation.

An LSTMCell can be created with either of the following (they are implemented identically):

cell = tf.contrib.rnn.BasicLSTMCell(num_units)
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

and the cell is then wrapped in the generic RNN layer:

tf.keras.layers.RNN(
    cell,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)  # recommended
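A minimal sketch of this recommended pattern, staying inside tf.keras by using tf.keras.layers.LSTMCell as the cell:

import tensorflow as tf
from tensorflow.keras.layers import RNN, LSTMCell

num_units = 2
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell = LSTMCell(num_units)                 # a single LSTM cell definition
layer = RNN(cell, return_sequences=False)  # the generic RNN wrapper unrolls the cell over time
h_t = layer(x)                             # shape (counts, num_units), equivalent to LSTM(num_units)(x)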

2. Backpropagation through time (BPTT)

For the memory cell M, consider the following stacked-LSTM sketch:

# h_all_1: output of a previous LSTM layer with return_sequences=True
# h_all_1 shape=[counts,time_steps,cell_num_1]
h_all_2 = LSTM(32, return_sequences=True,
               return_state=False, dropout=0, recurrent_initializer='orthogonal', kernel_initializer='orthogonal')(h_all_1)
# h_all_2 shape=[counts,time_steps,32]
h_all_3 = LSTM(8, return_sequences=True,
               return_state=False, dropout=0, recurrent_initializer='orthogonal', kernel_initializer='orthogonal')(h_all_2)
# h_all_3 shape=[counts,time_steps,8]; return_sequences=True so that GlobalMaxPool1D can pool over the time axis
lay3 = GlobalMaxPool1D()(h_all_3)
# lay3 shape=[counts,8]
lay4 = tf.layers.dense(lay3, units=20, activation="relu")
# output = activation(dot(input, kernel)+bias)
# lay4 shape=[counts,units_nums]=[counts,20]
print("lay4 shape is {}".format(lay4.shape))

3. Vanishing and exploding gradients


If you found this article helpful, consider buying me a coffee to encourage me to write more!